
Conversation

@peterdudfield (Contributor) commented Dec 17, 2025

Pull Request

Description

Move the uk-national API over to this quartz-api.

  • National routes
  • GSP routes
  • Status routes - these still load from the old database for the moment; this will change in the future
  • system routes

Currently

  • forecast/all takes ~13 seconds, which is too slow relative to the old app's benchmark. We are currently looping over ForecastAtTimestamp
  • pvlive/all takes ~10 seconds

#147

How Has This Been Tested?

  • CI tests
  • Ran locally, with a UAT Airflow run against the API
  • Deployed and ran on AWS, with a UAT Airflow run against the API
  • Deployed the UI to use this API - link

Don't do in this PR: add an end-to-end test for uk-national with the app #179
Don't do in this PR: add a separate UAT in Airflow that tests it

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

repr(sorted(params)),
])

log.info(f"Cache key generated: {key}")
Contributor Author:

probably remove

@devsjc (Contributor) left a comment:

Not a finished review, haven't got all the way down the files yet!

Comment on lines 8 to 20
def get_window(
    start: datetime | None = None, end: datetime | None = None,
) -> tuple[datetime, datetime]:
    """Returns the start and end of the window for timeseries data."""
    # Window start is the beginning of the day two days ago
    start = (dt.datetime.now(tz=dt.UTC) - dt.timedelta(days=2)).replace(
        hour=0,
        minute=0,
        second=0,
        microsecond=0,
    )
    if start is None:
        start = datetime.now(tz=UTC) - timedelta(days=2)
        start = floor_6_hours_dt(start)

    # Window end is the beginning of the day two days ahead
    end = (dt.datetime.now(tz=dt.UTC) + dt.timedelta(days=2)).replace(
        hour=0,
        minute=0,
        second=0,
        microsecond=0,
    )
    if end is None:
        end = start + timedelta(days=4)

Contributor:

I'd be tempted to get rid of this altogether and just mandate that all the timeseries routes have a start and an end

Contributor Author:

Yeah, the logic has to sit somewhere: people call the current uk-national API with no start or end datetimes, so we have to generate them.

Contributor:

Is it possible with FastAPI to just have them in the routes but make them optional such that they default to current time +/- 2 days or whatever it's meant to be?

Contributor Author:

Yep, there is probably a way to do that. So move it from the backend to the routes (specifically the uk-national and ruvnl region routes).

Contributor:

I'd say so - would save having to have this file at least!

model_name: str | None = None,
start_datetime: datetime | None = None,
end_datetime: datetime | None = None,
created_utc_upper_limit: datetime | None = None,
Contributor:

Is created_before_datetime more in line with the other fields?

Contributor Author:

Yeah, I'm happy to change it to that.

Comment on lines +147 to +155
class Region(BaseModel):
    """Region metadata."""

    region_name: str = Field(..., json_schema_extra={"description": "The name of the region."})
    region_metadata: dict | None = Field(
        None,
        json_schema_extra={"description": "Additional metadata about the region."},
    )

Contributor:

Can this not be a subclass of LocationPropertiesBase?

Contributor Author:

Could do. Does LocationPropertiesBase force a lat/lon, whereas a Region doesn't?

Contributor:

Everything has a lat/lon, even regions, as they have a centroid/associated point. So it's not a problem for region to have a lat/lon.

Contributor Author:

I'm actually tempted to leave this; it's just been moved from below. I'll tidy this up another time.

Comment on lines +11 to +17
    func: Callable,  # noqa
    namespace: str = "",
    *,
    request: Request = None,
    response: Response = None,  # noqa
    args,  # noqa
    kwargs,  # noqa
Contributor:

Do you need all these #noqa's? You have some type hinting; also, you never use args or kwargs.

Contributor Author:

I thought you need a #noqa per line, but I could be wrong.

I'll look into removing the ones I don't need, but it might be that the cache function needs certain inputs.

Contributor:

You do need it per line, but what I meant was, are they doing anything? If you remove the unused arguments does something complain?

Contributor Author:

I'll check; it might be that the cache function has to have them.

log = logging.getLogger(__name__)


async def key_builder(
Contributor Author:

TODO: add permission to the cache key, for intraday users.
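A sketch of what that TODO might look like, assuming a set of permission scopes is available from authdata (function and parameter names are hypothetical, not fastapi-cache's API):

```python
import hashlib


def build_cache_key(
    namespace: str,
    path: str,
    params: dict,
    user_scopes: frozenset[str] = frozenset(),
) -> str:
    """Build a cache key that also varies on the caller's permission scopes,
    so e.g. intraday users never share cached responses with other users."""
    parts = [
        namespace,
        path,
        repr(sorted(params.items())),
        repr(sorted(user_scopes)),  # scopes included so permissions partition the cache
    ]
    return hashlib.sha256(":".join(parts).encode()).hexdigest()
```

Hashing keeps key length bounded regardless of how many params or scopes a request carries.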

) -> list[models.PredictedPower]:
values = await self._get_predicted_power_production_for_location(
location_uuid=UUID(location),
location_uuid=location,
Contributor:

Wouldn't it be safer to make the location input to the function a UUID type? Then we can enforce it at the route validation level

forecast_horizon_minutes=forecast_horizon_minutes,
smooth_flag=smooth_flag,
oauth_id=None,
model_name=model_name,
Contributor:

We use forecaster now (or try to!)

Contributor Author:

OK, I will change that.

Comment on lines +338 to +369
async def get_forecast_metadata(
    self,
    location_uuid: str,
    authdata: dict[str, str],  # noqa: ARG002
    model_name: str | None = None,
) -> models.ForecastMetadata:
    """Get forecast metadata for a site."""
    req = dp.GetLatestForecastsRequest(
        location_uuid=location_uuid,
        energy_source=dp.EnergySource.SOLAR,
    )
    resp = await self.dp_client.get_latest_forecasts(req)

    # Filter by model name if provided
    if model_name:
        resp.forecasts = [
            forecast for forecast in resp.forecasts
            if forecast.forecaster.forecaster_name == model_name
        ]

    resp.forecasts.sort(
        key=lambda f: f.created_timestamp_utc,
        reverse=True,
    )
    forecasts = resp.forecasts[0]

    return models.ForecastMetadata(
        initialization_timestamp_utc=forecasts.initialization_timestamp_utc,
        created_timestamp_utc=forecasts.created_timestamp_utc,
        forecaster_name=forecasts.forecaster.forecaster_name,
        forecaster_version=forecasts.forecaster.forecaster_version,
    )
Contributor:

This entire function just to return an init_timestamp of the latest forecast seems unnecessary and ripe for overuse; surely everything that wants this can just get it as part of the response to their own query?

Contributor Author:

It's also used in the national route, where we return metadata, so I think we need all this.

Contributor:

In my other comments I tried to describe that the metadata that is fetched by this route is already surfaced by the Data Platform's GetForecastAsTimeseries route. So all that has to be done is for that data to be surfaced in the _get_solar_power_production_for_timestamp DBClient function and it can be passed into the output of the national route without making any extra calls to the database.

Comment on lines +578 to +704
@override
async def get_forecast_for_multiple_locations(
    self,
    location_uuids_to_location_ids: dict[str, int],
    authdata: dict[str, str],
    start_datetime_utc: dt.datetime | None = None,
    end_datetime_utc: dt.datetime | None = None,
    model_name: str | None = None,
) -> list[models.OneDatetimeManyForecastValuesMW]:
    """Get a forecast for multiple sites.

    Args:
        location_uuids_to_location_ids: A mapping from location UUIDs to location IDs.
        authdata: Authentication data for the user.
        start_datetime_utc: The start datetime for the prediction window. Default is None.
        end_datetime_utc: The end datetime for the prediction window. Default is None.
        model_name: The name of the forecasting model to use. Default is None.

    Returns:
        A list of OneDatetimeManyForecastValuesMW objects.
    """
    start, end = get_window(start=start_datetime_utc, end=end_datetime_utc)

    # Timestamps 30 minutes apart from start to end
    n_half_hours = int((((end - start).total_seconds() // 60) // 30) + 1)
    timestamps = [start + dt.timedelta(minutes=30 * x) for x in range(n_half_hours)]

    # Get forecasters
    req = dp.ListForecastersRequest(
        forecaster_names_filter=[model_name],
        latest_versions_only=True,
    )
    resp = await self.dp_client.list_forecasters(req)
    forecaster = resp.forecasters[0]

    forecasts_per_timestamp = []
    tasks = []
    for timestamp in timestamps:
        req = dp.GetForecastAtTimestampRequest(
            location_uuids=list(location_uuids_to_location_ids.keys()),
            energy_source=dp.EnergySource.SOLAR,
            timestamp_utc=timestamp,
            forecaster=forecaster,
        )
        task = asyncio.create_task(self.dp_client.get_forecast_at_timestamp(req))
        tasks.append(task)

    list_results = await asyncio.gather(*tasks, return_exceptions=True)
    for exc in filter(lambda x: isinstance(x, Exception), list_results):
        raise exc

    for resp in list_results:
        if len(resp.values) == 0:
            continue

        forecasts_one_timestamp = models.OneDatetimeManyForecastValuesMW(
            datetime_utc=resp.timestamp_utc,
            forecast_values={
                location_uuids_to_location_ids[forecast.location_uuid]: round(
                    forecast.value_fraction * forecast.effective_capacity_watts / 10**6, 2,
                )
                for forecast in resp.values
            },
        )

        # Sort the dictionary by keys
        forecasts_one_timestamp.forecast_values = dict(
            sorted(forecasts_one_timestamp.forecast_values.items()),
        )

        forecasts_per_timestamp.append(forecasts_one_timestamp)

    return forecasts_per_timestamp

@override
async def get_generation_for_multiple_locations(
    self,
    location_uuids_to_location_ids: dict[str, int],
    authdata: dict[str, str],
    start_datetime: dt.datetime | None = None,
    end_datetime: dt.datetime | None = None,
    observer_name: str = "ruvnl",
) -> list[models.GSPYieldGroupByDatetime]:
    """Get generation for multiple sites."""
    start, end = get_window(start=start_datetime, end=end_datetime)

    tasks = []
    for location_uuid in location_uuids_to_location_ids:
        req = dp.GetObservationsAsTimeseriesRequest(
            location_uuid=location_uuid,
            observer_name=observer_name,
            energy_source=dp.EnergySource.SOLAR,
            time_window=dp.TimeWindow(
                start_timestamp_utc=start,
                end_timestamp_utc=end,
            ),
        )
        task = asyncio.create_task(self.dp_client.get_observations_as_timeseries(req))
        tasks.append(task)

    list_results = await asyncio.gather(*tasks, return_exceptions=True)
    for exc in filter(lambda x: isinstance(x, Exception), list_results):
        raise exc

    # Combine results into GSPYieldGroupByDatetime
    observations_by_datetime = {}
    for observation in list_results:
        location_id = location_uuids_to_location_ids[observation.location_uuid]

        for value in observation.values:
            timestamp = value.timestamp_utc
            if timestamp not in observations_by_datetime:
                # Make a dictionary generation_kw_by_gsp_id
                observations_by_datetime[timestamp] = {}

            generation_kw = int(value.effective_capacity_watts * value.value_fraction / 1000.0)
            observations_by_datetime[timestamp][location_id] = generation_kw

    # Format to a list of GSPYieldGroupByDatetime
    observations_by_datetime_formatted = [
        models.GSPYieldGroupByDatetime(
            datetime_utc=timestamp,
            generation_kw_by_gsp_id=dict(sorted(generation_kw_by_gsp_id.items())),
        )
        for timestamp, generation_kw_by_gsp_id in observations_by_datetime.items()
    ]
    return observations_by_datetime_formatted
Contributor:

I don't think either of these functions should exist. I'm aware they might have to for this specific PR, but it goes against the design of the data platform to do these massive 2D calls.

Contributor Author:

Should I put it in the route then? Then it's less likely to be re-used.

Contributor:

Oh that's possibly sensible actually

Comment on lines +222 to +224
@override
async def get_forecast_metadata() -> models.ForecastMetadata:
    raise NotImplementedError()
Contributor:

This might not be needed

@@ -1,22 +1,41 @@
"""Utility functions..."""

import datetime as dt
Contributor:

Why modify this? It's now different to every other file

Contributor Author:

I can change this back.

resp = await self.dp_client.list_forecasters(req, metadata={"traceid": traceid})
forecaster = resp.forecasters[0]

req = dp.GetForecastAsTimeseriesRequest(
Contributor:

The response to this returns the created_at and init_time of the forecast that produced each value in the timeseries. If we surfaced this into the models.PredictedPower dataclass, there would be no need for the get_forecast_metadata route.

Contributor Author:

Yeah, that's one way to do it.

The tricky thing is, sometimes we want this metadata and sometimes we don't. The really stripped-down version should not get it; we should just get a timeseries. This is like an optional extra that users can request.

@devsjc (Contributor) commented Jan 19, 2026:

But then you already have it regardless, so you might as well pass it up to the top, and put the conditional solely on writing it back to the user at the very end of the route logic. There is no "stripping down" to be achieved by not passing it through; the data platform returns it regardless every time. In fact, all the "stripped down" version does at the moment is skip the extra (unnecessary) call to the database via get_forecast_metadata (so really it's the other version that's "stripped up"!)

Contributor Author:

So does there need to be a change in the data platform for this?

Contributor:

No, the data platform already returns the init time and created time alongside every forecast value when you call GetForecastAsTimeseries. All that has to be done is for the DBClient's _get_predicted_solar_power_for_location function to actually propagate these values up the stack in their returned dataclasses. The National route can then either show or not show those values based on user preference.

Comment on lines +61 to +65
forecast = await db.get_forecast_metadata(
    location_uuid=national_location_uuid,
    authdata={},
    model_name=model_name,
)
Contributor:

This should call a more generic function that just gets the latest forecasts, and then filter the output of that by name. I think a DBClient function like that would be more generally useful.

Contributor Author:

What do you mean by DBClient function here?

Contributor:

As in, an extra route on the DBClient interface that is get_latest_forecasts or something similar.

Contributor Author:

Ah, I see. So it's basically renaming this?

Contributor:

Well, renaming it and changing its function. From what I can tell, what this function actually does in its current state is:

  • get the latest forecast run for the given location (and optional forecaster name)
  • return the initialisation time of that forecast

So get_latest_forecast_init_time_for_location is a more accurate name, but, as I've described, you don't need a new db call for this, as that metadata is already returned when the data platform gives you a forecast. As such, I think this function should be completely overhauled to be a more generic get_latest_forecasts function.

Comment on lines +110 to +114
if include_metadata:
    forecast_metadata: ForecastMetadata = await db.get_forecast_metadata(
        location_uuid=national_location_uuid,
        model_name=model_name,
        authdata=auth,
    )
Contributor:

This whole thing could be removed, as the value can be got from the get_predicted_solar_power_production_for_location call below (if the previously mentioned changes are made).
